Symbolic Representation of Time Series: A Hierarchical Coclustering Formalization
Authors
Abstract
The choice of an appropriate representation remains crucial for mining time series, particularly to reach a good trade-off between the dimensionality reduction and the stored information. Symbolic representations constitute a simple way of reducing the dimensionality by turning time series into sequences of symbols. SAXO is a data-driven symbolic representation of time series which encodes typical distributions of data points. This approach was first introduced as a heuristic algorithm based on a regularized coclustering approach. The main contribution of this article is to formalize SAXO as a hierarchical coclustering approach. The search for the best symbolic representation given the data is turned into a model selection problem. Comparative experiments demonstrate the benefit of the new formalization, which results in representations that drastically improve the compression of data.
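As a rough illustration of what "turning time series into sequences of symbols" can mean, the sketch below averages a series over equal-length segments and maps each segment mean to a letter using equal-frequency breakpoints taken from the series itself. This is a generic, simplified discretization written for illustration only; it is not the SAXO algorithm or its coclustering formalization, and the function names `paa` and `symbolize` are hypothetical.

```python
from statistics import mean


def paa(series, n_segments):
    """Piecewise aggregate approximation: mean of each of n_segments chunks.

    Assumes len(series) >= n_segments so that no chunk is empty.
    """
    n = len(series)
    # Integer boundaries that split the series into near-equal chunks.
    bounds = [(i * n) // n_segments for i in range(n_segments + 1)]
    return [mean(series[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]


def symbolize(series, n_segments, alphabet="abcd"):
    """Encode a series as a word with one symbol per segment.

    Breakpoints are empirical quantiles of the series (an illustrative
    choice, not the data-driven encoding described in the abstract).
    """
    values = sorted(series)
    k = len(alphabet)
    breakpoints = [values[(i * len(values)) // k] for i in range(1, k)]
    word = ""
    for segment_mean in paa(series, n_segments):
        # The symbol index is the number of breakpoints at or below the mean.
        index = sum(segment_mean >= b for b in breakpoints)
        word += alphabet[index]
    return word


print(symbolize([1, 1, 2, 2, 9, 9, 10, 10], n_segments=4, alphabet="ab"))  # aabb
```

Here eight data points are reduced to a four-letter word, which shows the trade-off the abstract mentions: fewer segments or a smaller alphabet compress more but keep less information.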
Similar resources
Bayesian coclustering of Anopheles gene expression time series: study of immune defense response to multiple experimental challenges.
We present a method for Bayesian model-based hierarchical coclustering of gene expression data and use it to study the temporal transcription responses of an Anopheles gambiae cell line upon challenge with multiple microbial elicitors. The method fits statistical regression models to the gene expression time series for each experiment and performs coclustering on the genes by optimizing a joint...
Feature Extraction over Multiple Representations for Time Series Classification
We suggest a simple yet effective and parameter-free feature construction process for time series classification. Our process is decomposed in three steps: (i) we transform original data into several simple representations; (ii) on each representation, we apply a coclustering method; (iii) we use coclustering results to build new features for time series. It results in a new transactional (i.e....
A Symbolic Representation of Time Series Employing Key-Sequences and a Hierarchical Approach
Efficiently and accurately searching for similarities among time series and discovering interesting patterns is an important and non-trivial problem. There is a lot of prior work, e.g., the F-index introduced by Agrawal et al., the ST-index proposed by Faloutsos et al., and PAA suggested by Keogh et al. In this paper we suggest a new method: HFVQA (Hierarchical Frequency-based Vector Quantized Approximatio...
Cats & Co: Categorical Time Series Coclustering
We suggest a novel method of clustering and exploratory analysis of temporal event sequence data (also known as categorical time series) based on three-dimensional data grid models. A data set of temporal event sequences can be represented as a data set of three-dimensional points, each defined by three variables: a sequence identifier, a time value and an event value. Instantiating d...
Algorithms for Segmenting Time Series
As with most computer science problems, representation of the data is the key to efficient and effective solutions. Piecewise linear representation has been used for the representation of the data. This representation has been used by various researchers to support clustering, classification, indexing and association rule mining of time series data. A variety of algorithms have been proposed to obtain...
Publication date: 2015